Intro to NGS processing
James A. Fellows Yates
2021-08-17
Who am I?
- Education
- B.Sc. Bioarchaeology (University of York, UK)
- M.Sc. Naturwissenschaftliche Archäologie (Archaeological Science; University of Tübingen, DE)
- Ph.D. Archaeogenetics (MPI-SHH / MPI-EVA, DE)
- Experience
- Number of genetics classes taken: 0
- Number of bioinformatics classes taken: 0
@jfy133
Today we will
- Introduce what DNA sequencing is
- Explain how Illumina NGS sequencing data is generated
- Learn how to evaluate NGS data [Practical]
What is DNA?
Deoxyribonucleic acid (/diːˈɒksɪˌraɪboʊnjuːˌkliːɪk, -ˌkleɪ-/; DNA) is a molecule composed of two polynucleotide chains that coil around each other to form a double helix carrying genetic instructions for the development, functioning, growth and reproduction of all known organisms and many viruses. - Wikipedia
The rules
- Four nucleotides
- Pyrimidines:
Cytosine, Thymine
- Purines:
Guanine, Adenine
- Base pairing: one pyrimidine with one purine
C with G (think: CGI)
A with T (think: AT-AT walker)
- Complementary
C on one strand, G on the other (or v.v.)
A on one strand, T on the other (or v.v.)
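The pairing rules fit in a small lookup table. A minimal Python sketch (the names `PAIRS` and `complement` are illustrative, not from any library), assuming the second strand is written directly underneath the first so each base is read against its partner at the same position:

```python
# Base-pairing rules from the slide: C pairs with G, A pairs with T
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def complement(strand: str) -> str:
    """Return the base found on the opposite strand at each position."""
    return "".join(PAIRS[base] for base in strand.upper())


# Every allowed pair joins exactly one purine with one pyrimidine
for base, partner in PAIRS.items():
    assert (base in PURINES) != (partner in PURINES)
    assert (base in PYRIMIDINES) != (partner in PYRIMIDINES)

print(complement("GATTACA"))  # CTAATGT
```

Taking the complement twice gets you back to the original strand, which is why either strand carries the full information.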

The rules
- Make a copy of a DNA strand with a polymerase
- Unwind the DNA
- Separate the strands
- Make new strand: find a C, add a G (etc.)
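The copying steps can be sketched in Python. This is a simplified illustration: it ignores primers and the direction a real polymerase works in, writes both strands aligned position by position, and the function names `synthesise` and `replicate` are hypothetical:

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesise(template: str) -> str:
    """Build a new strand opposite the template: find a C, add a G (etc.)."""
    return "".join(PAIRS[base] for base in template)


def replicate(top: str, bottom: str) -> tuple[tuple[str, str], tuple[str, str]]:
    """Unwind and separate a double helix, then use each strand as a template."""
    if synthesise(top) != bottom:
        raise ValueError("strands are not complementary")
    copy_1 = (top, synthesise(top))  # old top strand + new bottom strand
    copy_2 = (synthesise(bottom), bottom)  # new top strand + old bottom strand
    return copy_1, copy_2


print(replicate("GATTACA", "CTAATGT"))
# (('GATTACA', 'CTAATGT'), ('GATTACA', 'CTAATGT'))
```

Each copy keeps one old strand and gains one new one, and both copies are identical to the starting molecule.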
How do we get DNA?

Introduction to DNA Sequencing
What is Sequencing?
Converting the chemical nucleotides of a DNA molecule to ACTG on your computer screen
Historically: Sanger sequencing
- Separate strands, add primer (starting point)
- Add mix of nucleotides, some with special ‘terminators’ that stop the new strand from growing
- Separate the fragments by size (e.g. capillary electrophoresis), read order of terminators
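A toy Python sketch of why sorting terminated fragments by size spells out the sequence. It assumes enough copies that a terminator lands at every position, writes strands 5'→3', and uses hypothetical function names:

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """New strand (written 5'->3') that pairs with `seq` (also written 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sanger_fragments(template: str) -> list[str]:
    """One fragment per length: each stops where a terminator was incorporated."""
    new_strand = reverse_complement(template)
    return [new_strand[: end + 1] for end in range(len(new_strand))]


def read_terminators(fragments: list[str]) -> str:
    """Separate by size (shortest first) and read each fragment's last base."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))


fragments = sanger_fragments("GATTACA")
random.Random(0).shuffle(fragments)  # the reaction produces fragments in no particular order
print(read_terminators(fragments))  # TGTAATC
```

The read is the newly synthesised strand, i.e. the reverse complement of the template. Note that each fragment only gives you one base, which is why this approach handles one fragment at a time.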
Pros and cons of Sanger Sequencing
- Pros
- More accurate (fewer errors)
- Longer reads
- Cons
- Resource heavy: needs a lot of input DNA
- Slow: one. fragment. at. a. time.
What is NGS?
- NGS: Next Generation Sequencing
- MASSIVELY multiplexed!
- Sequence millions and even billions of DNA reads at once!
Not really ‘next’ anymore: consider it more ‘second’ generation (see: Nanopore, often called ‘third’ generation)
What is NGS?
Market leader: Illumina
(Others: Roche 454, PacBio, Ion Torrent etc.)
How does it work?
- Basically the same concept, but:
- no size separation
- with pretty pictures!
i.e. attach fluorescent nucleotides, (normally) one colour per base (A, G, T, C)
Fire mah lazer, and take a picture! Rinse and repeat!
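Turning the pictures into bases can be sketched as: for each cycle, call whichever colour channel is brightest. This is a deliberately naive Python sketch; the channel order and intensity values are hypothetical, and real base callers also correct for effects such as overlapping dye signals:

```python
# Hypothetical channel order: one fluorescence colour per base
CHANNELS = ("A", "C", "G", "T")


def call_bases(cycle_intensities):
    """For each imaging cycle, call the base whose channel is brightest."""
    calls = []
    for intensities in cycle_intensities:
        brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        calls.append(CHANNELS[brightest])
    return "".join(calls)


# Made-up intensities for one spot over four cycles (A, C, G, T channels)
images = [
    (0.9, 0.1, 0.0, 0.1),  # cycle 1: A channel brightest
    (0.1, 0.2, 0.8, 0.1),  # cycle 2: G
    (0.0, 0.1, 0.1, 0.7),  # cycle 3: T
    (0.1, 0.9, 0.2, 0.0),  # cycle 4: C
]
print(call_bases(images))  # AGTC
```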
Where does this happen?
On a ‘flow cell’

Where does this happen?
But how do you get your DNA to attach to the lawn of oligos on the flow cell?
- Convert it to a library:
- Add adapters
- Add indexes
- Add priming sites
AATGATACGGCGACCACCACaccgacaaCCCTACACGACGCTCTTCCGATCTXXXXXXAGCACACGTCTGAACTCCAGTCACgacactaCCGTCTTCTGCTTG
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TTACTATGCCGCTGGTGGTGtggctgttGGGATGTGCTGCGAGAAGGCTAGAXXXXXXTCGTGTGCAGACTTGAGGTCAGTGctgtgatGGCAGAAGACGAAC
(uppercase: adapters and priming sites; lowercase: indexes; XXXXXX: your DNA)
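The library molecule above can be taken apart programmatically. A small Python sketch, assuming (as the change of case suggests) that lowercase runs are indexes, uppercase runs are adapter/priming sequence and `X` marks the insert; `X` pairing with `X` is just a placeholder convention for this example:

```python
import re

# The library molecule from the slide (top and bottom strands, aligned)
top = ("AATGATACGGCGACCACCACaccgacaaCCCTACACGACGCTCTTCCGATCT"
       "XXXXXXAGCACACGTCTGAACTCCAGTCACgacactaCCGTCTTCTGCTTG")
bottom = ("TTACTATGCCGCTGGTGGTGtggctgttGGGATGTGCTGCGAGAAGGCTAGA"
          "XXXXXXTCGTGTGCAGACTTGAGGTCAGTGctgtgatGGCAGAAGACGAAC")

# X stands in for the unknown insert bases, so it "pairs" with itself here
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C", "X": "X"}


def segments(strand):
    """Split into runs: X (insert), lowercase (index), uppercase (adapter/priming site)."""
    return re.findall(r"X+|[acgt]+|[ACGT]+", strand)


def is_complementary(a, b):
    """True if every position of b pairs with the same position of a."""
    return len(a) == len(b) and all(PAIRS[x] == y for x, y in zip(a.upper(), b.upper()))


parts = segments(top)
indexes = [p for p in parts if p.islower()]
insert = [p for p in parts if p.startswith("X")]
print(indexes)  # ['accgacaa', 'gacacta']
print(insert)  # ['XXXXXX']
print(is_complementary(top, bottom))  # True
```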
Sequencing-by-synthesis
- Once attached, make lots of copies (clustering)
- Separate the strands, add primer
- Add the fluorescent nucleotides: only the complementary one will bind
- Fire the laser, and take a photo
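One cluster's run of cycles can be sketched in Python. Simplifications (all assumptions for illustration): the template is written 5'→3' with the primer sitting right at its 3' end (in reality the primer binds an adapter site), every copy in the cluster behaves identically so one template stands for the whole cluster, and nothing goes wrong chemically:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sequence_by_synthesis(template: str, n_cycles: int) -> str:
    """Simulate the imaging cycles for one cluster (no adapters, no errors)."""
    read = []
    # Synthesis starts next to the primer, at the template's 3' end
    position = len(template) - 1
    for _ in range(min(n_cycles, len(template))):
        base = COMPLEMENT[template[position]]  # only the complementary nucleotide binds
        read.append(base)  # fire the laser, photograph the colour, record the base
        position -= 1  # wash, then move on to the next cycle
    return "".join(read)


print(sequence_by_synthesis("GATTACAGGC", n_cycles=6))  # GCCTGT
```

The number of cycles caps the read length: with 6 cycles only the first 6 bases are read, however long the molecule is.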
Improving quality
Throughput limits
Paired end
Paired end sequencing
Once at the end, bend over, attach the other end (turnaround), and sequence again from the other end of the molecule
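What paired-end reads look like, and how a short molecule can be rebuilt from them, in a Python sketch. The function names are hypothetical, and the greedy exact-match overlap is a simplification; real read-merging tools tolerate sequencing errors:

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def paired_reads(fragment: str, read_length: int) -> tuple[str, str]:
    """Read 1 from the start of the molecule; read 2 from the other end, after the turnaround."""
    read_1 = fragment[:read_length]
    read_2 = reverse_complement(fragment)[:read_length]
    return read_1, read_2


def merge_pair(read_1: str, read_2: str, min_overlap: int = 3) -> Optional[str]:
    """Rebuild a short fragment from overlapping reads; None if they do not overlap."""
    read_2_forward = reverse_complement(read_2)  # put read 2 back on read 1's strand
    # Try the longest possible overlap first, down to min_overlap
    for overlap in range(min(len(read_1), len(read_2_forward)), min_overlap - 1, -1):
        if read_1[-overlap:] == read_2_forward[:overlap]:
            return read_1 + read_2_forward[overlap:]
    return None


r1, r2 = paired_reads("ACGTAGGCTTACCGA", read_length=10)
print(r1, r2)  # ACGTAGGCTT TCGGTAAGCC
print(merge_pair(r1, r2))  # ACGTAGGCTTACCGA
```

Here two 10-base reads recover a 15-base molecule. If the molecule is longer than both reads together, they do not overlap, but you still know what both ends look like.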
Cons of NGS sequencing
- Less accurate (the laser/photo step can miscall a base)
- Chemistry limits (DNA strands degrade through repeated heat cycling for denaturing; dirty environment from suboptimal wash steps etc.) mean short reads (compensated by volume)
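Why volume compensates for less accurate reads: when many reads cover the same stretch of DNA, independent miscalls rarely agree, so a per-position majority vote recovers the true base. A toy Python sketch, assuming the reads are already lined up on the same positions and errors are independent substitutions at a hypothetical, deliberately high 10% rate; real error profiles and analyses are more involved:

```python
import random
from collections import Counter

BASES = "ACGT"


def noisy_read(truth: str, error_rate: float, rng: random.Random) -> str:
    """Copy `truth`, miscalling each base with probability `error_rate`."""
    read = []
    for base in truth:
        if rng.random() < error_rate:
            # A miscall: one of the three other bases is recorded instead
            read.append(rng.choice([b for b in BASES if b != base]))
        else:
            read.append(base)
    return "".join(read)


def consensus(reads: list[str]) -> str:
    """Majority vote at each position across equal-length reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


rng = random.Random(42)  # fixed seed so the example is repeatable
truth = "ACGTAGGCTTACCGA"
# 31 reads of the same positions, each with the hypothetical 10% error rate
reads = [noisy_read(truth, error_rate=0.1, rng=rng) for _ in range(31)]
print(sum(read != truth for read in reads), "of", len(reads), "reads contain errors")
print(consensus(reads) == truth)
```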